Syntax errors identification from compiler error messages using ML techniques
Author
Abstract
Compiler error messages facilitate software development and debugging by reporting the cause and location of an error, but because of various compiler bugs and inconsistencies they often fail at this purpose and negatively affect the performance of both novice and experienced programmers. An errant semicolon or brace can result in many errors being reported throughout the program. This study statistically analyzes open-source code bases to predict the real errors behind different types of compiler error messages. It also attempts to auto-fix these errors. At a high level, the study handles two cases: (1) one error is present in the code, and (2) two different errors are present in the code. We start by collecting different types of error messages for both cases through random error generation in C projects. We developed different models using document clustering, probabilistic topic modeling, and multi-label classification algorithms, trained on the collected error messages, to predict the real errors in both cases. Our empirical evaluation on open-source projects shows that our model correctly predicts the real error in almost 95% of cases when only one error exists in the program. When two errors exist, the model correctly predicts at least one error in almost 91% of cases and both errors in almost 39% of cases.
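The abstract does not say how the random errors were generated, so the following is only a minimal sketch of one plausible approach: deleting randomly chosen punctuation tokens from C source text and recording what was removed as the ground-truth label. The function name `inject_errors` and the token set `FRAGILE_TOKENS` are hypothetical and not taken from the study. The sketch is deliberately naive and does not skip tokens inside string literals or comments.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical token set: single characters whose removal usually
# produces a syntax error in C (an assumption, not the study's list).
FRAGILE_TOKENS = (";", "{", "}", "(", ")")


def inject_errors(
    source: str, n_errors: int = 1, seed: Optional[int] = None
) -> Tuple[str, List[Tuple[int, str]]]:
    """Delete n_errors randomly chosen fragile tokens from C source text.

    Returns the mutated source and a list of (line_number, token) labels
    describing each removed token, ordered by position in the file.
    """
    rng = random.Random(seed)
    positions = [i for i, ch in enumerate(source) if ch in FRAGILE_TOKENS]
    if len(positions) < n_errors:
        raise ValueError("not enough candidate tokens to remove")

    chosen = sorted(rng.sample(positions, n_errors))
    # Line numbers are 1-based and refer to the original source.
    labels = [(source.count("\n", 0, i) + 1, source[i]) for i in chosen]

    # Delete from the end so that earlier indices stay valid.
    mutated = source
    for i in reversed(chosen):
        mutated = mutated[:i] + mutated[i + 1:]
    return mutated, labels


if __name__ == "__main__":
    original = "int main(void) {\n    return 0;\n}\n"
    broken, truth = inject_errors(original, n_errors=2, seed=42)
    print(broken)
    print(truth)
```

Passing `n_errors=1` or `n_errors=2` corresponds to the two cases the abstract describes. Each mutated file could then be compiled (for example with `gcc -fsyntax-only`) and the resulting diagnostics paired with the recorded labels to form training examples.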
Similar resources
Reverse Engineering of Network Software Binary Codes for Identification of Syntax and Semantics of Protocol Messages
Reverse engineering of network applications, especially from the security point of view, is of high importance and interest. Many network applications use proprietary protocols whose specifications are not publicly available. Reverse engineering of such applications could provide us with vital information to understand their embedded unknown protocols. This could facilitate many tasks including d...
Introducing modified TypeScript in an existing framework to improve error handling
Error messages in compilers are a topic that is often overlooked. The quality of the messages can have a big impact on development time and ease of learning. Another method used to speed up development is to build a domain-specific language (DSL). This thesis migrates an existing framework to use TypeScript in order to speed up development time with compile-time error handling. Alternative method...
Eliminating Spurious Error Messages Using Exceptions, Polymorphism, and Higher-Order Functions
Many language processors make assumptions after detecting an error. If the assumptions are invalid, a compiler may issue a cascade of error messages in which only the first represents a true error in the input; later messages are side effects of the original error. Eliminating such spurious error messages requires keeping track of values within the compiler that are not available because of a prev...
A Concept of Agent-based Learning Support System for C Programming
Programming is one of the most important factors in computer literacy education. However, when first learning to program, many students run into simple problems, such as syntax errors caused by carelessness or mistyping. As a result, they feel that programming is too hard. The most important problem is that students can't take advantage of the error messages displayed fr...
Identification and assessment of Human Error in Cabin Roofed Crane Using SHERPA and SPAR-H Techniques
Introduction: Human errors play a significant role in the occurrence of industrial accidents. This study aims to investigate the human errors in cabin roof crane operators of a metal industry using SHERPA and SPAR-H techniques. Material and Method: In this research, first, all of the tasks of the tower crane operator were identified and analyzed. Then, adopting SHERPA technique, p...
Publication date: 2017